NSF PAR Search | NSF Public Access Repository


A calibrated BISG for inferring race from surname and geolocation

https://doi.org/10.1093/jrsssa/qnaf003

Greengard, Philip; Gelman, Andrew (January 2025, Journal of the Royal Statistical Society Series A: Statistics in Society)

Abstract: Bayesian Improved Surname Geocoding (BISG) is a ubiquitous tool for predicting race and ethnicity using an individual’s geolocation and surname. Here we demonstrate that statistical dependence of surname and geolocation within racial/ethnic categories in the US results in biases for minority subpopulations, and we introduce a raking-based improvement. Our method augments the data used by BISG (distributions of race by geolocation and race by surname) with the distribution of surname by geolocation obtained from state voter files. We validate our algorithm on state voter registration lists that contain self-identified race/ethnicity.
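For readers unfamiliar with the baseline, the following is a minimal sketch of the standard BISG update that the abstract refers to, not the authors' calibrated method. It assumes surname and geolocation are conditionally independent given race, which is exactly the assumption the paper reports as a source of bias. The function name `bisg_posterior`, the category labels, and all probabilities are hypothetical placeholders, not values from the paper.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Standard BISG via Bayes' rule.

    P(race | surname, geo) is proportional to P(race | surname) * P(geo | race),
    ASSUMING surname and geolocation are independent within each race category.
    Both arguments are dicts keyed by the same category labels.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total <= 0:
        raise ValueError("probabilities must have positive total mass")
    # Normalize so the posterior sums to one across categories.
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical toy inputs (illustrative only, not real census or voter data).
p_race_given_surname = {"group_a": 0.70, "group_b": 0.20, "group_c": 0.10}
p_geo_given_race = {"group_a": 0.001, "group_b": 0.004, "group_c": 0.002}

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
```

In this toy case the geolocation term shifts probability toward `group_b` even though the surname alone favors `group_a`. The raking-based correction described in the abstract additionally uses the surname-by-geolocation distribution and is not reproduced here.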
Hierarchical Bayesian models to mitigate systematic disparities in prediction with proxy outcomes

https://doi.org/10.1093/jrsssa/qnae142

Mikhaeil, Jonas M; Gelman, Andrew; Greengard, Philip (December 2024, Journal of the Royal Statistical Society Series A: Statistics in Society)

Abstract: Label bias occurs when the outcome of interest is not directly observable and modelling is instead performed with proxy labels. When the difference between the true outcome and the proxy label is correlated with predictors, this can yield systematic disparities in predictions for different groups of interest. We propose Bayesian hierarchical measurement models to address these issues. When strong prior information about the measurement process is available, our approach improves accuracy and helps with algorithmic fairness. If prior knowledge is limited, our approach allows assessment of the sensitivity of predictions to the unknown specifications of the measurement process. This can help practitioners gauge whether enough substantive information is available to guarantee the desired accuracy and avoid disparate predictions when using proxy outcomes. We demonstrate our approach through practical examples.
Full Text Available
The Piranha Problem: Large Effects Swimming in a Small Pond

https://doi.org/10.1090/noti3044

Tosh, Christopher; Greengard, Philip; Goodrich, Ben; Gelman, Andrew; Vehtari, Aki; Hsu, Daniel (January 2025, Notices of the American Mathematical Society)

Full Text Available
Fast Methods for Posterior Inference of Two-Group Normal-Normal Models

https://doi.org/10.1214/22-BA1329

Greengard, Philip; Hoskins, Jeremy; Margossian, Charles C; Gabry, Jonah; Gelman, Andrew; Vehtari, Aki (September 2023, Bayesian Analysis)

Full Text Available
Toward a taxonomy of trust for probabilistic machine learning

https://doi.org/10.1126/sciadv.abn3999

Broderick, Tamara; Gelman, Andrew; Meager, Rachael; Smith, Anna L.; Zheng, Tian (February 2023, Science Advances)

A taxonomy delineates where trust can break down in a probabilistic machine learning workflow that informs critical decisions.
Full Text Available
Improving Multilevel Regression and Poststratification with Structured Priors

https://doi.org/10.1214/20-BA1223

Gao, Yuxiang; Kennedy, Lauren; Simpson, Daniel; Gelman, Andrew (September 2021, Bayesian Analysis)

Full Text Available
Accounting for uncertainty during a pandemic

https://doi.org/10.1016/j.patter.2021.100310

Zelner, Jon; Riou, Julien; Etzioni, Ruth; Gelman, Andrew (August 2021, Patterns)

Full Text Available
An Updated Dynamic Bayesian Forecasting Model for the US Presidential Election

https://doi.org/10.1162/99608f92.fc62f1e1

Heidemanns, Merlin; Gelman, Andrew; Morris, G. Elliott (December 2020, Harvard Data Science Review)

Full Text Available
Routine Hospital-based SARS-CoV-2 Testing Outperforms State-based Data in Predicting Clinical Burden

https://doi.org/10.1097/EDE.0000000000001396

Covello, Leonard; Gelman, Andrew; Si, Yajuan; Wang, Siquan (January 2021, Epidemiology)

Full Text Available
Bayesian hierarchical weighting adjustment and survey inference

Si, Yajuan; Trangucci, Rob; Gabry, Jonah; Gelman, Andrew (December 2020, Survey Methodology)

We combine weighting and Bayesian prediction in a unified approach to survey inference. The general principles of Bayesian analysis imply that models for survey outcomes should be conditional on all variables that affect the probability of inclusion. We incorporate all the variables that are used in the weighting adjustment under the framework of multilevel regression and poststratification, generating model-based weights after smoothing as a byproduct. We improve small area estimation by addressing the complex issues that arise in real-life applications, obtaining robust inference at finer levels for subdomains of interest. We investigate deep interactions and introduce structured prior distributions for smoothing and stability of estimates. The computation is done in Stan, is implemented in the open-source R package rstanarm, and is available for public use. We evaluate the design-based properties of the Bayesian procedure. Simulation studies illustrate how the model-based prediction and weighting inference can outperform classical weighting. We apply the method to the New York Longitudinal Study of Wellbeing. The new approach generates smoothed weights and increases efficiency for robust finite population inference, especially for subsets of the population.
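The poststratification step mentioned in the abstract can be illustrated with a minimal sketch. It assumes cell-level estimates are already available (for example, from a fitted multilevel model) and simply averages them with population cell counts as weights. It is not the authors' implementation, which is described as using Stan and rstanarm; the function name `poststratify` and all numbers are hypothetical.

```python
def poststratify(cell_estimates, cell_counts):
    """Population-weighted average of cell-level estimates.

    Computes sum_j N_j * theta_j / sum_j N_j, where theta_j is the estimate
    for poststratification cell j and N_j is that cell's population count.
    """
    if len(cell_estimates) != len(cell_counts):
        raise ValueError("estimates and counts must have the same length")
    total_count = sum(cell_counts)
    if total_count <= 0:
        raise ValueError("population counts must sum to a positive number")
    weighted_sum = sum(n * theta for theta, n in zip(cell_estimates, cell_counts))
    return weighted_sum / total_count


# Hypothetical toy inputs: three cells with modeled estimates and known counts.
cell_estimates = [0.40, 0.55, 0.70]
cell_counts = [5000, 3000, 2000]

overall = poststratify(cell_estimates, cell_counts)
```

Restricting the sum to the cells that make up a subdomain gives the corresponding subgroup estimate, which is where smoothed cell-level estimates matter most.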
Full Text Available
